Skip to content

Conversation

@donghao526
Copy link

@donghao526 donghao526 commented Aug 19, 2025

ISSUE

It closes #3063.

Proposed Changes

Add TDIGEST.REVRANK command implementation
Add cpp unit tests

@PragmaTwice PragmaTwice changed the title feat(tdigest): add TDIGEST.Revrank command implementation #3063 feat(tdigest): add the support of TDIGEST.REVRANK command Aug 19, 2025
@PragmaTwice
Copy link
Member

Thank you for your contribution. Could you add some golang test cases for it?

Refer to https://github.com/apache/kvrocks/blob/unstable/tests/gocase/unit/type/tdigest/tdigest_test.go.

@donghao526 donghao526 closed this Aug 19, 2025
@donghao526
Copy link
Author

@PragmaTwice ok,I will add some golang test

@donghao526 donghao526 reopened this Aug 19, 2025
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
39.7% Coverage on New Code (required ≥ 50%)

See analysis details on SonarQube Cloud

Copy link
Member

@LindaSummer LindaSummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

Thanks very much for your contribution! 😊

Left some comments.

Best Regards,
Edward

Comment on lines 193 to 228
{
LockGuard guard(storage_->GetLockManager(), ns_key);

if (auto status = getMetaDataByNsKey(ctx, ns_key, &metadata); !status.ok()) {
return status;
}

if (metadata.total_observations == 0) {
result->resize(inputs.size(), -2);
return rocksdb::Status::OK();
}

if (metadata.unmerged_nodes > 0) {
auto batch = storage_->GetWriteBatchBase();
WriteBatchLogData log_data(kRedisTDigest);
if (auto status = batch->PutLogData(log_data.Encode()); !status.ok()) {
return status;
}

if (auto status = mergeCurrentBuffer(ctx, ns_key, batch, &metadata); !status.ok()) {
return status;
}

std::string metadata_bytes;
metadata.Encode(&metadata_bytes);
if (auto status = batch->Put(metadata_cf_handle_, ns_key, metadata_bytes); !status.ok()) {
return status;
}

if (auto status = storage_->Write(ctx, storage_->DefaultWriteOptions(), batch->GetWriteBatch()); !status.ok()) {
return status;
}

ctx.RefreshLatestSnapshot();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

The merge action could be refactored into a function to reduce duplication of the same logic.

Best Regards,
Edward

Comment on lines 240 to 246
for (auto value : inputs) {
auto status_or_rank = TDigestRevRank(dump_centroids, value);
if (!status_or_rank) {
return rocksdb::Status::InvalidArgument(status_or_rank.Msg());
}
result->push_back(*status_or_rank);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

We could sort the inputs and get the ranks with just one scan of the centroids since it's sorted.

Best Regards,
Edward

Copy link
Author

@donghao526 donghao526 Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @LindaSummer
I encountered a problem when I was testing. After the nodes merged, are there two adjacent centroids can be with the same mean?

I Test with

TDIGEST.CREATE s COMPRESSION 1000

TDIGEST.ADD s 10 10 10 10 20 20

I found the centroids after merged are:
(1) mean: 10 weight: 1
(2) mean: 10 weight: 1
(3) mean: 10 weight: 1
(4) mean: 10 weight: 1
(5) mean: 20 weight: 1
(6) mean: 20 weight: 1

Is this as expected or a bug?

Copy link
Member

@LindaSummer LindaSummer Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

It is expected, and you could refer to #2878 for more details.
So we need a stable way for both serialization and deserialization.

The trigger for the merge is the weight, not the mean. So we could treat the mean only as a label of one centroid. The whole logic is driven by weight.

Best Regards,
Edward

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

}

template <typename TD>
inline StatusOr<int> TDigestRevRank(TD&& td, double value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

We need to use a stable way to compare between doubles.

It will be tough to assume that the two double numbers are equal to or greater than.

After solving this, we should add some test cases for this corner case.

Best Regards,
Edward

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

Since the other code snippets use this way now. You could leave it with the current logic.

I will try to create a new PR to solve the unstable comparison problem in this file.

Best Regards,
Edward

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, after your new PR, I can help to fix here.

@donghao526 donghao526 marked this pull request as draft August 20, 2025 14:27
@donghao526
Copy link
Author

@LindaSummer I have modified the code according to your suggestion, please review.

  1. The merge action has been refactored into the mergeNodes function
  2. I have sorted the inputs in TDigestRevRank, now we could get the ranks with just one scan of the centroids

Best Regards

@donghao526 donghao526 requested a review from LindaSummer August 28, 2025 05:29
@donghao526 donghao526 marked this pull request as ready for review August 28, 2025 05:29
Copy link
Member

@LindaSummer LindaSummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

Thanks very much for your great effort! 😊

Left some comments.

Best Regards,
Edward

}

template <typename TD>
inline Status TDigestRank(TD&& td, const std::vector<double>& inputs, std::vector<int>& result) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems specialized for RevRank rather than Rank.

Could we implement the Rank, then wrap the Iterator to a reverse version with the same logic to construct RevRank?

@donghao526 donghao526 marked this pull request as draft September 18, 2025 12:10
@LindaSummer
Copy link
Member

Hi @donghao526 ,

Sorry to interrrupt you.
Do you still work on this? I want to continue work on this draft PR if you don't mind. 😊

Best Regards,
Edward

@donghao526
Copy link
Author

donghao526 commented Oct 25, 2025

Hi @donghao526 ,

Sorry to interrrupt you. Do you still work on this? I want to continue work on this draft PR if you don't mind. 😊

Best Regards, Edward

Hi @LindaSummer ,

Sorry, I was very busy in the past two months. I'm still work on this, and I will submit new patch within the a week.

@LindaSummer
Copy link
Member

Hi @donghao526 ,
Sorry to interrrupt you. Do you still work on this? I want to continue work on this draft PR if you don't mind. 😊
Best Regards, Edward

Hi @LindaSummer ,

Sorry, I was very busy in the past two months. I'm still work on this, and I will submit new patch within the a week.

Hi @donghao526 ,

Thanks very much for your effort and contribution! ❤

You could do this anytime when you have time.

Your personal affairs are always in first priority and we don't need to do it in a rush. 😊

Best Regards,
Edward

@donghao526 donghao526 marked this pull request as ready for review October 25, 2025 15:15
@sonarqubecloud
Copy link

@LindaSummer LindaSummer requested a review from Copilot October 26, 2025 14:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements the TDIGEST.REVRANK command, which returns the reverse rank (number of elements greater than the given value) for one or more values in a t-digest data structure. The implementation includes support for handling edge cases like empty digests, values outside the range, and duplicate values.

Key changes:

  • Adds the TDigestRevRank algorithm to compute reverse ranks for input values
  • Refactors merge logic by extracting a reusable mergeNodes method
  • Adds comprehensive test coverage in both Go integration tests and C++ unit tests

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/types/tdigest.h Defines the TDigestRevRank template function and helper for rank assignment
src/types/redis_tdigest.h Declares the RevRank and mergeNodes methods in the TDigest class
src/types/redis_tdigest.cc Implements RevRank, refactors mergeNodes, and fixes iterator validation logic
src/commands/cmd_tdigest.cc Adds the CommandTDigestRevRank command handler and registration
tests/cppunit/types/tdigest_test.cc Adds C++ unit tests for various RevRank scenarios
tests/gocase/unit/type/tdigest/tdigest_test.go Adds Go integration tests for TDIGEST.REVRANK command

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// because we must guarantee the iterator to be inside the valid range before iteration.
bool Prev() {
if (Valid() && iter_ != centroids_.cbegin()) {
if (Valid()) {
Copy link

Copilot AI Oct 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After calling std::advance(iter_, -1), the iterator could become invalid (move before cbegin()). The validity check on line 79 will return true even when iter_ is before cbegin(), allowing subsequent operations on an out-of-bounds iterator. The validity check should occur before the advance operation, and the advance should only happen if iter_ != centroids_.cbegin().

Suggested change
if (Valid()) {
if (Valid() && iter_ != centroids_.cbegin()) {

Copilot uses AI. Check for mistakes.
Copy link
Member

@LindaSummer LindaSummer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @donghao526 ,

Thanks very much for your effort! ❤
Left with some comments.

Please also go thorugh the comments left by github-copilot.

Best Regards,
Edward

return Valid();
}
bool Valid() const { return iter_ != centroids_.cend(); }
bool Valid() const { return iter_ < centroids_.cend() && iter_ >= centroids_.cbegin(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that iter_ should always greater or equal to the cbegin().

}
*output = redis::Array(rev_ranks);
} else {
*output = redis::BulkString("nan");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refer to tdigest.revrank.

All values are -2 if the sketch is empty.

inline Status TDigestRevRank(TD&& td, const std::vector<double>& inputs, std::vector<int>& result) {
std::map<double, std::vector<size_t>> value_to_indices;
for (size_t i = 0; i < inputs.size(); ++i) {
value_to_indices[inputs[i]].push_back(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is hard to compare two double number for a map. we need a stable way for the compare operator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TDigest: Implement TDIGEST.REVRANK command

8 participants